Word matching using single closed contours for indexing handwritten historical documents

Internal identifier: 000F18 (Main/Exploration); previous: 000F17; next: 000F19

Authors: Tomasz Adamek [Ireland]; Noel E. O'Connor [Ireland]; Alan F. Smeaton [Ireland]

Source:

International Journal on Document Analysis and Recognition (Print) [ISSN 1433-2833]; 2007.

RBID : Pascal:07-0469287

French descriptors

Pascal (Inist)
- Indexation, Caractère manuscrit, Reconnaissance optique caractère, Reconnaissance caractère, Mot, Langage naturel, Analyse documentaire, Analyse image, Traitement image, Signal vidéo, Evaluation performance, Annotation, Extraction forme, Méthode échelle multiple, Segmentation.

English descriptors

KwdEn :
- Annotation, Character recognition, Document analysis, Image analysis, Image processing, Indexing, Manuscript character, Multiscale method, Natural language, Optical character recognition, Pattern extraction, Performance evaluation, Segmentation, Video signal, Word.

Abstract

Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, holistic word recognition has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL'04), pp. 278-287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O'Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.
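To make the idea of elastic matching concrete, the following is a minimal sketch only. It uses dynamic time warping (DTW), a generic elastic alignment of two sequences, as a stand-in; it is not the multiscale contour descriptor or the matching technique of the paper. The function names and the toy one-dimensional "contour signatures" are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences of numbers.

    Stand-in for an elastic matcher: samples of one sequence may be
    stretched or compressed to align with the other.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


def best_match(query, labelled):
    """Return the label whose signature is closest to the query under DTW.

    `labelled` is a list of (label, signature) pairs (hypothetical data).
    """
    return min(labelled, key=lambda item: dtw_distance(query, item[1]))[0]


# Toy signatures (invented values, not data from the paper).
templates = [("word_a", [0, 1, 1, 0]), ("word_b", [5, 5, 5])]
label = best_match([0, 1, 0], templates)
```

In a holistic setting, each unknown word image would be reduced to some signature and assigned the label of its nearest annotated example; the choice of signature and distance is where the paper's contribution lies, and this sketch does not reproduce it.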

Affiliations:

Ireland

Links to previous steps (curation, corpus, ...)

- Stream PascalFrancis, step Corpus: 000322
- Stream PascalFrancis, step Curation: 000464
- Stream PascalFrancis, step Checkpoint: 000260
- Stream Main, step Merge: 000F31
- Stream Main, step Curation: 000F18

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author><name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">07-0469287</idno>
<date when="2007">2007</date>
<idno type="stanalyst">PASCAL 07-0469287 INIST</idno>
<idno type="RBID">Pascal:07-0469287</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000322</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000464</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000260</idno>
<idno type="wicri:doubleKey">1433-2833:2007:Adamek T:word:matching:using</idno>
<idno type="wicri:Area/Main/Merge">000F31</idno>
<idno type="wicri:Area/Main/Curation">000F18</idno>
<idno type="wicri:Area/Main/Exploration">000F18</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author><name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
<s2>Dublin</s2>
<s3>IRL</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Irlande (pays)</country>
<wicri:noRegion>Dublin</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2007">2007</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Annotation</term>
<term>Character recognition</term>
<term>Document analysis</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Indexing</term>
<term>Manuscript character</term>
<term>Multiscale method</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern extraction</term>
<term>Performance evaluation</term>
<term>Segmentation</term>
<term>Video signal</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Indexation</term>
<term>Caractère manuscrit</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Signal vidéo</term>
<term>Evaluation performance</term>
<term>Annotation</term>
<term>Extraction forme</term>
<term>Méthode échelle multiple</term>
<term>Segmentation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, holistic word recognition has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL'04), pp. 278-287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O'Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.</div>
</front>
</TEI>
<affiliations><list><country><li>Irlande (pays)</li>
</country>
</list>
<tree><country name="Irlande (pays)"><noRegion><name sortKey="Adamek, Tomasz" sort="Adamek, Tomasz" uniqKey="Adamek T" first="Tomasz" last="Adamek">Tomasz Adamek</name>
</noRegion>
<name sortKey="O Connor, Noel E" sort="O Connor, Noel E" uniqKey="O Connor N" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
<name sortKey="Smeaton, Alan F" sort="Smeaton, Alan F" uniqKey="Smeaton A" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
</country>
</tree>
</affiliations>
</record>
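
The record above can also be read programmatically. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the record. One caveat drives the design: the record uses the wicri: and inist: prefixes without declaring them, so a strict XML parser rejects it as-is. The wrapper element and its namespace URIs are placeholders invented for this example, not official Wicri or Inist identifiers, and parse_record is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record; the full record carries more fields.
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Word matching using single closed contours for indexing handwritten historical documents</title>
<author><name sortKey="Adamek, Tomasz" first="Tomasz" last="Adamek">Tomasz Adamek</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Centre for Digital Video Processing, Dublin City University</s1>
</inist:fA14>
<country>Irlande (pays)</country>
</affiliation>
</author>
<author><name sortKey="O Connor, Noel E" first="Noel E." last="O'Connor">Noel E. O'Connor</name>
</author>
<author><name sortKey="Smeaton, Alan F" first="Alan F." last="Smeaton">Alan F. Smeaton</name>
</author>
</titleStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Indexing</term>
<term>Word</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs (assumption): they only bind the prefixes so
# that the parser accepts the document.
WRAPPER = ('<wrapper xmlns:wicri="urn:example:wicri" '
           'xmlns:inist="urn:example:inist">{}</wrapper>')


def parse_record(xml_text):
    """Extract title, author names and English keywords from a record."""
    root = ET.fromstring(WRAPPER.format(xml_text))
    record = root.find("record")
    title = record.findtext(".//titleStmt/title")
    authors = [n.text for n in record.findall(".//titleStmt/author/name")]
    keywords = [t.text
                for t in record.findall(".//keywords[@scheme='KwdEn']/term")]
    return {"title": title, "authors": authors, "keywords": keywords}


info = parse_record(RECORD)
```

Restricting the author query to titleStmt avoids counting each author twice, since the full record repeats the author list under sourceDesc.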

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F18 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F18 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:07-0469287
   |texte=   Word matching using single closed contours for indexing handwritten historical documents
}}
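
When many records need such links, the template call can be generated rather than typed. The following is a sketch that reproduces the layout shown above; explor_link and its argument names are hypothetical, and only the template name and its parameter names (wiki, area, flux, étape, type, clé, texte) come from this page.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Build an {{Explor lien}} template call with aligned values."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "   |name=" to 13 characters so that all values line up.
        lines.append("   |{}=".format(name).ljust(13) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    step="Exploration",
    key_type="RBID",
    key="Pascal:07-0469287",
    text="Word matching using single closed contours for indexing "
         "handwritten historical documents",
)
```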

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
